Add SkyReels V2: Infinite-Length Film Generative Model #11518
base: main
Conversation
It's about time. Thanks.
Mid-PR questions:
@tolgacangoz Thanks for working on this, really cool work so far!
Re 2 and 3: I think in this case we should have separate implementations of SkyReelsV2 and Wan due to the autoregressive nature of the former. Adding any extra code in Wan might complicate it for readers. I'll let @yiyixuxu comment on this though.
FWIW, I have been successful in using the same T5 encoder as WAN 2.1 for this model, just by fiddling with their pipeline:
Then I incorporate my bitsandbytes nf4 transformer, their tokenizer, and the WAN-based T5 encoder:
I need to add this function to the pipeline for the T5 encoder to work:
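For reference, the kind of wiring being described might look roughly like this. This is a sketch, not the commenter's exact code: the repo ids are placeholders, the 4-bit settings are illustrative, and the `SkyReelsV2Transformer3DModel` / `SkyReelsV2DiffusionForcingPipeline` classes are the ones proposed in this PR.

```python
import torch
from transformers import AutoTokenizer, UMT5EncoderModel
from diffusers import BitsAndBytesConfig, SkyReelsV2DiffusionForcingPipeline, SkyReelsV2Transformer3DModel

# Reuse the WAN 2.1 UMT5 text encoder and tokenizer (same T5 family as SkyReels V2).
text_encoder = UMT5EncoderModel.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="text_encoder", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="tokenizer")

# Load the SkyReels V2 transformer in bitsandbytes nf4 to save memory.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
transformer = SkyReelsV2Transformer3DModel.from_pretrained(
    "<skyreels-v2-diffusers-repo>",  # placeholder repo id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Assemble the pipeline with the swapped-in components.
pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
    "<skyreels-v2-diffusers-repo>",  # placeholder repo id
    transformer=transformer,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)
pipeline.enable_model_cpu_offload()
```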
That seems appropriate to me. Only the Diffusion Forcing pipelines differ for the large models. How are the results with your setup?
Hi @yiyixuxu @a-r-r-o-w and SkyReels Team @yjp999 @pftq @Langdx @guibinchen ... This PR will be ready for review for …
…mask` based on configuration flag. This change enhances flexibility in model behavior during training and inference.
…ensure consistency and correct functionality.
…sV2TimeTextImageEmbedding`.
…itialization to directly assign the list of SkyReelsV2 components.
…ys convert query, key, and value to `torch.bfloat16`, simplifying the code and improving clarity.
…by adding VAE initialization and detailed prompt for video generation, improving clarity and usability of the documentation.
…and improve formatting in `pipeline_skyreels_v2_diffusion_forcing.py` to enhance code readability and maintainability.
…ine` from 5.0 to 6.0 to enhance video generation quality.
…definition of `SkyReelsV2DiffusionForcingPipeline` to ensure consistency and improve video generation quality.
…peline` to default to `None`.
…odel` to *ensure* correct tensor operations.
…peat_interleave` for improved efficiency in `SkyReelsV2Transformer3DModel`.
Colocates the `SkyReelsV2Timesteps` class with the SkyReelsV2 transformer model. This change moves model-specific timestep embedding logic from the general embeddings module to the transformer's own file, improving modularity and making the model more self-contained.
Replaces manual parameter iteration with the `get_parameter_dtype` helper to determine the time embedder's data type. This change improves code readability and centralizes the logic.
Or, I think they can stay as a placeholder or potential feature, because the original code is also what I could not produce good results with for FLF2V on the 1.3B models. Or maybe it was I who couldn't run this task properly, idk :S. Maybe it is OK with larger models. I think this PR is well suited for its job of integration. Edit: I opened an issue at the original repo about this. I forgot to open it earlier, sry 🥲.
@tolgacangoz
Deletes the `FlowMatchUniPCMultistepScheduler` as it is no longer being used.
Removes the `FlowMatchUniPCMultistepScheduler` and integrates its functionality into the existing `UniPCMultistepScheduler`. This consolidation is achieved by using the `use_flow_sigmas=True` parameter in `UniPCMultistepScheduler`, simplifying the scheduler API and reducing code duplication. All usages, documentation, and tests are updated accordingly.
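For illustration, a minimal sketch of the consolidated scheduler setup; the `flow_shift` value here is just an example:

```python
from diffusers import UniPCMultistepScheduler

# Flow-matching behavior now comes from the existing UniPC scheduler via
# `use_flow_sigmas=True`, instead of a separate FlowMatchUniPCMultistepScheduler.
scheduler = UniPCMultistepScheduler(
    prediction_type="flow_prediction", use_flow_sigmas=True, flow_shift=8.0
)
```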
Updates the variable name from `pipe` to `pipeline` across all SkyReels V2 documentation examples. This change improves clarity and consistency.
…ross SkyReels-V2 files
…initialization across SkyReels test files
The `generator` parameter is not used by the scheduler's `step` method within the SkyReelsV2 diffusion forcing pipelines. This change removes the unnecessary argument from the method call for code clarity and consistency.
…'s dtype in SkyReelsV2TimeTextImageEmbedding
Replaces manual parameter iteration with the `get_parameter_dtype` helper.
Adds a check to ensure the `_keep_in_fp32_modules` attribute exists on a parameter before it is accessed. This prevents a potential `AttributeError`, making the utility function more robust when used with models that do not define this attribute.
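To illustrate both of these changes, a small runnable sketch; the import path and the stand-in module are assumptions for illustration, not the exact diffusers code:

```python
import torch
from diffusers.models.modeling_utils import get_parameter_dtype

# A stand-in module playing the role of the time embedder.
time_embedder = torch.nn.Linear(256, 1024, dtype=torch.bfloat16)

# Instead of manually iterating over parameters, the shared helper resolves the dtype.
assert get_parameter_dtype(time_embedder) == torch.bfloat16

# The robustness fix follows this pattern: only read the optional attribute
# when the module actually defines it, avoiding an AttributeError.
keep_in_fp32_modules = getattr(time_embedder, "_keep_in_fp32_modules", None) or []
```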
This will be my 3rd pipeline contribution, yay 🥳!
@@ -168,6 +168,8 @@ class UniPCMultistepScheduler(SchedulerMixin, ConfigMixin):
use_beta_sigmas (`bool`, *optional*, defaults to `False`):
Whether to use beta sigmas for step sizes in the noise schedule during the sampling process. Refer to [Beta Sampling is All You Need](https://huggingface.co/papers/2407.12173) for more information.
use_flow_sigmas (`bool`, *optional*, defaults to `False`):
@tolgacangoz ohh this cannot be the only change in scheduler, no?
ohh it's already in!
does the output quality match?
The outputs are qualitatively/visibly the same.
thanks!
Thanks for the opportunity to fix #11374!
Original Work
Original repo: https://github.com/SkyworkAI/SkyReels-V2
Paper: https://huggingface.co/papers/2504.13074
TODOs:
✅ `SkyReelsV2Transformer3DModel`: 90% `WanTransformer3DModel`
✅ `SkyReelsV2DiffusionForcingPipeline`
✅ `SkyReelsV2DiffusionForcingImageToVideoPipeline`: Includes FLF2V.
✅ `SkyReelsV2DiffusionForcingVideoToVideoPipeline`: Extends a given video.
✅ `SkyReelsV2Pipeline`
✅ `SkyReelsV2ImageToVideoPipeline`: Includes FLF2V.
✅ `scripts/convert_skyreelsv2_to_diffusers.py`: tolgacangoz/SkyReels-V2-Diffusers
⏳ Did you make sure to update the documentation with your changes? Did you write any new necessary tests?: We will construct these during review.
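For context, a hedged end-to-end usage sketch of the text-to-video Diffusion Forcing pipeline added here; the checkpoint repo id and the call arguments are illustrative and may differ from the final API:

```python
import torch
from diffusers import SkyReelsV2DiffusionForcingPipeline
from diffusers.utils import export_to_video

# Placeholder repo id for a converted diffusers-format checkpoint.
pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
    "<skyreels-v2-df-diffusers-repo>", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A penguin dances."
output = pipeline(
    prompt=prompt,
    num_inference_steps=30,  # illustrative value
    guidance_scale=6.0,      # default used in this PR's docs
    num_frames=97,           # illustrative value
).frames[0]
export_to_video(output, "skyreels_v2_df.mp4", fps=24)
```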
T2V with Diffusion Forcing (OLD)
| original | `diffusers` integration |
| --- | --- |
| original_0_short.mp4 | diffusers_0_short.mp4 |
| original_37_short.mp4 | diffusers_37_short.mp4 |
| original_0_long.mp4 | diffusers_0_long.mp4 |
| original_37_long.mp4 | diffusers_37_long.mp4 |
I2V with Diffusion Forcing (OLD)
`prompt` = "A penguin dances."
`diffusers` integration: i2v-short.mp4
FLF2V with Diffusion Forcing (OLD)
Now, Houston, we have a problem.
I have been unable to produce good results with this task. I tried many hyperparameter combinations with the original code.
The first frame's latent (`torch.Size([1, 16, 1, 68, 120])`) is overwritten onto the first of the 25 frame latents in `latents` (`torch.Size([1, 16, 25, 68, 120])`). Then the last frame's latent is concatenated, so `latents` becomes `torch.Size([1, 16, 26, 68, 120])`. After the denoising process, the last frame's latent is discarded and the remainder is decoded by the VAE. I also tried not concatenating the last frame but instead overwriting it onto the last frame of `latents`, and not discarding the last frame latent at the end, but still got bad results. A sketch of this latent wiring follows the result clips below. Here are some results:
0.mp4
1.mp4
2.mp4
3.mp4
4.mp4
5.mp4
6.mp4
7.mp4
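As referenced above, a minimal sketch of the latent wiring described in this section; the shapes come from the description, and the variable names are illustrative rather than taken from the pipeline code:

```python
import torch

# Noise for 25 latent frames, plus the VAE-encoded first and last frames.
latents = torch.randn(1, 16, 25, 68, 120)
first_frame_latent = torch.randn(1, 16, 1, 68, 120)
last_frame_latent = torch.randn(1, 16, 1, 68, 120)

# Overwrite the first latent frame with the encoded first frame...
latents[:, :, :1] = first_frame_latent
# ...and concatenate the encoded last frame, giving 26 latent frames.
latents = torch.cat([latents, last_frame_latent], dim=2)
assert latents.shape == (1, 16, 26, 68, 120)

# (denoising loop would run here)

# The extra last-frame latent is discarded again before VAE decoding.
latents = latents[:, :, :-1]
assert latents.shape == (1, 16, 25, 68, 120)
```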
V2V with Diffusion Forcing (OLD)
This pipeline extends a given video.
| input | `diffusers` integration |
| --- | --- |
| video1.mp4 | v2v.mp4 |
Firstly, I want to congratulate you on this great work, and thanks for open-sourcing it, SkyReels Team! This PR proposes an integration of your model.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@yiyixuxu @a-r-r-o-w @linoytsaban @yjp999 @Howe2018 @RoseRollZhu @pftq @Langdx @guibinchen @qiudi0127 @nitinmukesh @tin2tin @ukaprch @okaris